-
Notifications
You must be signed in to change notification settings - Fork 533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#560: Unable to parse millisecond unix timestamps #561
Conversation
Circling back to this. This seems like a reasonable approach to me. |
|
||
Timestamp => { | ||
let next_size = get_fixed_item_len(next_item).unwrap_or(0); | ||
(s.len() - next_size, false, Parsed::set_timestamp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using s.len()
here is problematic for parsing any formats where the timestamp does not terminate at the end of the string.
For example, I have seen a format that is like %s%3f%z
- so 123456789+0000
- in this case you hit %s
consuming all of 123456789
because it is limited to the max of 123456789+0
.
Another case I've seen is Date(%s%3f)
- so Date(123456789)
- in this case you hit %s
consuming all of 1234567
because that is the max, but %3f
is left with three extra characters.
It seems the only real way would be to do something like scan::number
to read however many chars Timestamp
would capture and then go from there. Sadly, the return of scan::number
doesn't currently include the number of characters read, instead returning a new slice pointer so that is not a trivial fix.
Alternatively you could do something like this:
(s.len() - next_size, false, Parsed::set_timestamp) | |
let numeric_bytes_available = | |
s.as_bytes().iter().take_while(|&&c| b'0' <= c && c <= b'9').count(); | |
(numeric_bytes_available - next_size, false, Parsed::set_timestamp) |
But it seems like leaking the implementation details of scan::number
that shouldn't really be leaked here.
I don't know if it's really necessary to fix this for those formats, if they should just be a separate issue, or if they should just not be fixed by the library and left to consumers to implement tricky logic.
I agree this is worth fixing. Combining An alternative is to add new specifiers to allow parsing millisecond, microsecond, and nanosecond timestamps as a single integer. @djc any preferences? Also @meiamsome raised a good issue about the implementation here. @jgoday This PR is near the end of its second year... Would you still be available to push it forwards? |
The approach in this PR seems wordt persuing to me, but in a more general way. It would be nice to be able to parse |
Given the lack of response from the original submitter, going to just close this for now. IMO the original issue isn't a high priority. |
This is a worthy improvement. |
Worthy how? Have you personally had a use case for this? Given the relative lack of resources I would like to avoid adding features people just think of. |
The lack of support for parsing string millisecond timestamps does come up in https://vector.dev/ occasionally, where we use |
@jszwedko please open a new issue explaining your use case. |
Okay, I'm open to reviewing a new PR that takes into account the feedback on this PR so far. |
Thanks for contributing to chrono!
about adding the PR number)
we can't reintroduce it?
This PR tries to solve #560.
When parsing Timestamps, it allows to check the next item fixed length to avoid Timestamp consuming all the remain digits (see #560 example with nanoseconds "%s%3f").
It groups items in tuples (current and next item) inside parse_internal, using slice windows.